Date: Mon, 3 Feb 92 15:09:01 -0800
From: Jeff (Grambo) Graham <graham@TEETOT.ACUSD.EDU>

                STRAND SOFTWARE TECHNOLOGIES, INC.
                           STRAND88

                     TECHNICAL DESCRIPTION

                       Buckingham Release

                          July 1990


SCOPE OF THIS DOCUMENT

This document provides a short technical description of STRAND88. Readers who
are unfamiliar with the ideas of concurrency should read the companion
document STRAND88 Technical Overview first This document is not intended to
be a management level description.

STRAND AND STRAND88

The two terms Strand and STRAND88 have precise and different meanings. Strand
is a language definition that is, by and large, in the public domain, such a
definition covers only the language, its syntax and semantics but does not
cover the environment or libraries.  STRAND88 is a product, currently the
only available implementation of Strand, which includes an abstract machine
emulator, compiler, lint, environment and associated services and libraries.
STRAND88 has had a number of releases, the current release is called
Buckingham, and is the second major release, the previous release was called
Admiralty. Within a major release, minor updates occur, for example to
support new hardware, these are referred to by build number. The initial
Buckingham release has build number B.6. Very little in this document has
been changed to reflect the upgrade from Admiralty to Buckingham.

Before describing the features of Strand it Is important to discard many of
the conventional components of sequential languages. In a Strand system there
are no stacks and hence no stack frames, variables behave quite unlike
variables in C or Pascal, the concept of flow of control does not exist, and
within some constraints, the semantics of a program are independent of the
order that it is written.

STRAND FEATURES

Strand is a language that has parallel semantics, by this we mean that
statements in Strand are executed concurrently, not in the order they occur
in a procedure definition. Each statement, more accurately each procedure
call, is executed in a separate process, on a particular processor. A process
in this context is a far cry from a large scale process. Strand processes
differ from other forms of process in the following ways. Firstly they are
small, consuming only a few words of memory, secondly they only ever perform
one operation, that is to spawn zero or more new processes, once this is
performed the process is complete and can be recycled. Lastly they may have
to wait for the correct set of inputs before they can execute

STRAND PROCESSES

A process has a set of arguments each one of which may be a data value
(simple or structured) or a variable. Data structures may be referenced by
more than one process, or one process argument may be a part of another data
structure referenced in some other process. Variables may also be referenced
by more than one process argument, that is they may be shared. A running
Strand program therefore consists of a network of processes, connected by
shared data structures and variables. There are no global structures. An
alternative, and equally valid view, is that of a set of data structures
being constructed in core by a set of processes, each process waiting to add
its part of the edifice

One can represent a process as a term of the form:

        mod1:start(A1,A2,...Ak)

That is it has a name, start, is defined in a module, mod1, and arguments A1
to Ak. We say that the number of arguments, k, is the arity of the process.
Normally a process name is written in text with the arity appended, for
example start/4, if start has an arity of 4. The reason for this is that
Strand procedures with the same name but different arities are semantically
distinct.

When a process executes, we use the term reduces, it may spawn new processes,
some of which may be inbuilt procedures called kernels. Kernels are the
primitive operations of Strand and perform various data manipulation
activities.

VARIABLES

Variables in Strand are single assignment, and as mentioned in the previous
paragraph may be shared, i.e. part of more than one process. Variables can be
assigned a value. This replaces the variable with the value assigned, it is
not possible to update this replacement once assignment has occurred
Assignment is only performed by kernels, the most common being the assignment
kernel itself. If an attempt is made to assign to a non-variable then a
system error is generated. The effect of an assignment is that all process
arguments that previously referenced a variable now reference the value
assigned.

SHARED VARIABLES

Processes that share a variable may either be readers or writers, i.e. they
may be waiting to use the value, or to write it. It is clear from the
behavior of variable assignment that only one writer should ever exist for
each variable and that multiple writers will lead to a system error. We are
describing the lowest level semantics of Strand here, multiple writers are
obviously an important programming technique, and can be achieved by higher
level techniques.

MULTIPROCESSORS

Since the value assigned to a variable can not be changed it is always
unambiguous to make a copy of the value on another processor. Therefore the
situation of single writer and multiple readers where the readers are on
different processors is exactly the same as if they were on the same
processor. When an assignment is made, copies of the assigned value is
communicated to the other processors. It merely takes a little longer to pass
through the communications subsystem. To make one reader reduce on another
node that reader process needs to be spawned on the desired node No other
changes are necessary.

REDUCTION EXAMPLE

Suppose we had a process, called start, that spawned three processes, one
writer, rite, and two readers, reeda and reedb, that share a variable 'X',
then we can imagine the following series of events unfolding as execution
proceeds (refer to diagram)

        -------------------------------------------------------------
        | Time 0                start()                             |
        -------------------------------------------------------------
                reduce start/0
        -------------------------------------------------------------
        | Time 1                rite(X)  -----------\               |
        |                       reeda(X) ------------> Variable: X  |
        |                       reedb(X) -----------/               |
        -------------------------------------------------------------
                reduce rite/1
        -------------------------------------------------------------
        | Time 2                X := 42; -----------\               |
        |                       reeda(X) ------------> Variable: X  |
        |                       reedb(X) -----------/               |
        -------------------------------------------------------------
                assign 42 to X
        -------------------------------------------------------------
        | Time 3                reeda(X) ------------> 42           |
        |                       reedb(X) -----------/               |
        -------------------------------------------------------------
                reduce reeda/1 and reedb/1
        -------------------------------------------------------------
        | Time 4                                                    |
        -------------------------------------------------------------

At time 0 a single  process, start(), exists. Reducing this results in three
processes, rite/1, reeda/1 and reedb/1. One process is then selected for
reduction, we will assume it is rite/1, reducing this results in an
assignment kernel being spawned [In the actual implementation kernels are
normally executed 'in-line' for efficiency, the description here is a
conceptual model]
At time step 2 the assignment kernel, here shown as the assignment operator
':=' assigns 42 to the variable X. The two readers can now be reduced and
read the value 42. Note that the reader processes can not reduce until the
value is available. A process that is in this state is said to be suspended,
in this case suspended on the variable X. Suspended processes wait quietly
for the variable or variables they are waiting for to be assigned.

The reduction of a process is described by a procedure which is comprised of
a set of clauses.
Clauses have the following general form:

        H :- G1, G2,...Gm|B1, B2,...Bn                          m,n >= 0

Where H is the clause head, ':-' is known by convention as the implication
operator, the Gs constitute the clause's guard, '|' is the commit operator
and the Bs are the clause's body. The clause definition is terminated by a
period.
Clauses define basic process actions in the following way. The head and guard
together specify a set of preconditions under which a process may be reduced.
A clause of the form

        H :- G1, G2,...Gn|B1.

specifies that if its preconditions are met then the process changes to a new
state as specified by B1. A general clause of the form

        H :- G1, G2,...Gm|B1, B2,...Bn

specifies that if the preconditions are met then the process changes to a
state as specified by one of the body calls and simultaneously spawns or
forks processes as specified by the other body calls.

CLAUSE STRUCTURE, PROCEDURE DEFINITIONS AND MODULES

The clause head describes a state which is used to match against the state of
a process. it has the form

        H(T1, T2,...Tn)                         n>=0

The head consists of the name H (a string beginning with a lower case
character) of the procedure to which the clause belongs and the argument list
T1,T2,...Tn. The number of elements in the formal argument list is the arity.

The clause guard contains a sequence of guard kernel calls. A guard kernel
call invokes a predefined test operation known as a guard kernel which is
applied to the state of a given process. The language provides a set of guard
kernels. It is also possible for the programmer to define user guard kernels
using the foreign language interface . A guard kernel call has the form
        G(T1, T2,...Tn)                         n>=0

A guard kernel call consists of the guard kernel name G (a string beginning
with a lower case character) and an argument list.
The clause body contains a collection of body calls and assignments (see
below). A body call has the form

        B(T1, T2,...Tn)                         n>=0

Where B is the name of the procedure and n its arity and T1, T2,....Tn is its
argument list. A body call can be one of:

-       a [body] kernel call. This invokes functionality provided by Strand the
        language
-       a procedure call [to a procedure defined in the same module (see below)]
-       a system call [to the environment of the form C!]
-       an external procedure call [of the form M:C]
        (an external procedure call contains two components - a module M and a
        call C to a procedure in the module)
-       an addressed call [of the form C@N]
        (an addressed call contains two components - an address N and an
        indirect call C. The latter can be a system call or an external
        procedure call).

A procedure definition is formed by grouping together clauses the heads of
which have the same name and arity.

The code of a Strand program is composed of one or more modules. A module is
a collection of procedures, the definition of compilation mode and an exports
list. The exports list is a specification of the procedures contained in the
module which may be referenced by procedures from other modules.

Generally, the structure of a Strand module can be viewed as:

        specification of compilation mode
        specification of exports list

        procedure1

        procedure2 ...

The name of a module is determined by that of the file (with a .sam
extension) from which it is loaded, and therefore must conform to the host
operating system rules. The .sam suffix of compiled module files is assumed,
and need not be specified.

Modules are developed using an external editor to write Strand source code
files which conform to the structure described above. The source file must be
compiled by the Strand compiler before the procedures they contain can be
used. Strand source code files have the suffix .std. Compiled module files
have the suffix .sam.

A module, once loaded, is a first class Strand  data type and can be
manipulated by Strand programs in the same way as data types which can be
represented directly in Strand source code.

DATA STRUCTURES IN STRAND

Tuples

Tuples are denoted by a collection of data items enclosed in braces i.e.{}
and separated by commas. The arity of a tuple is the number of elements it
contains. For example, the following tuple may represent a building:

        {house, 77,{bedrooms, 3}, Owner}

In this case the arity is 4 and the elements are: the string house, the
number 77, the tuple {bedrooms, 3} and the variable Owner.

Lists

Lists are used to represent sequences of data items. A list is denoted by:
[Head|Tail] where both Head and Tail are terms. The head of a list represents
the first data item in the sequence; the tail represents the rest of the
sequence. An empty list (i.e. one that has no head or tail) is denoted by:[].
For example

        [house, 77, {bedrooms, 3}, Owner]

denotes a list which contains the same data as the example tuple shown above.

It must be understood that the tuple shown above and the list described here
are completely different data structures even though they contain the same
data items.

It is often valuable to be able to denote the continuation of a sequence
without mentioning its elements explicitly, e.g.

        [house, 77 | Rest]


AN EXAMPLE STRAND PROGRAM

This section provides a complete Strand program which allows illustration of
most of the terminology introduced previously.

% This module 'member' implements the procedure member/3 which is
% used to determine whether a data item is present in a list.

% Its format is - member(Item?, List?, Result^) - where Item and
% List are input arguments and Result is an output argument which
% is assigned the value 'true' or 'false' depending on whether Item
% is present in List.
% If List is not a list, Result is the value [] and an error is
% signalled.

-compile(free).% compilation mode
-exports([member/3]). % exports list

member(_,[],Result) :- Result := false    % C1 Item not found in
                                          % List.
member(X,[X|_],Result) :- Result := true  % C2 Item is first                
                                          % element in List.
member(X,[Y| Rest],Result) :-             % C2 Item not first
        X =\= Y|                          % element in List
                                          % so search rest of list.
member(X, Rest, Result).
member(_, IncorrectInput, Result) :-      % C4 for debugging. Trap
        otherwise|                        % cases where second arg
        Result := [],                     % is not a list. Send error
        error('arg not a list '           % message and set Result to
                ,IncorrectInput)!.        % empty list to allow other
                                          % process waiting for
                                          % Result to react.

Notice the following points:
1.      C1 and C2 and C3 all use the anonymous variable as a formal variable to
        indicate they do not care about the value of a formal argument.

2.      C3 uses the formal variable Rest as an actual argument in the recursive
        procedure call to member.

3.      C3 makes a guard kernel call to the guard kernel '=\=' and C4 makes a
        guard kernel call to the guard kernel 'otherwise'.

4.      C4 makes a system call to notify the occurrence of an error.

5.      C1, C2 and C3 all use a literal structure as a formal argument in each
        case this structure is used to match against a process argument to
        determine whether the clause can be used to reduce the process (matching
        is covered in detail in the next section).

6.      C1 and C2 both use the assignment operator ':=' to assign a literal
        value to a formal variable. (Assignment is covered in more detail in the
        next section.)


STRAND PROGRAM EXECUTION

A Strand computation corresponds to a pool of concurrently executing
processes Recall that each process can be represented by a term of the form:

        p(T1, T2,...Tn)                 (n>=0)

Where p/n identifies a procedure used to execute the process by its name and
arity and T1,...., Tn are the process arguments. The arguments are data
structures that comprise the state of the process. Computation proceeds by
repeatedly removing a process from the pool and attempting to reduce it.
Notice that this description does not preclude the possibility of
simultaneously attempting to reduce more than one process.

A reduction attempt involves finding a clause from the associated procedure
definition whose head matches the process state and the execution of the
clause's guard. If the preconditions specified by a given clause head and
guard are satisfied then the process commits to that particular clause. There
may be more than one clause whose head matches the process state, in this
instance the first one matching would be selected. It is important that the
programmer realize this since it imposes a requirement for an appropriate
degree of specificity in the definition of the preconditions of clauses. It
also serves to explain the need for guards in addition to the matching
process.

Once a process commits to a clause it is reduced. Recall that reduction
involves only three kinds of action: termination, changing state and forking.

MATCHING

Strand employs a matching algorithm to compare a clause head with a process
state. A process state matches with a clause head if the name of the
procedure associated with the process is the same as the name of the clause
head and the number of process arguments is the same as the number of the
clause's formal arguments and the process arguments match with the formal
arguments.

Strings match if the two strings are identical - they contain exactly the
same sequence of characters.

Numbers only match if they are identical. In addition integers do not match
against their real counterparts. It is normally dubious practice to match two
real numbers.

Variables match against variables.

Structures match if they are of the same size and type and the elements they
contain match. Thus a tuple can only match against a tuple of the same arity
and a list can only match against a list as long as the specified elements
match.

Although the matching algorithm may bind variables in the clause head, it may
not assign values to variables in the process state.

An important concept for Strand programming is data flow synchronization
where matching is suspended until sufficient data is available to determine
the outcome of a match. In essence, it is the availability of data that
determines when processes are able to execute

GUARD EXECUTION

We have previously mentioned guard kernels and guard kernel calls. A guard
kernel call appears in the guard of a clause and invokes the test operation
specified by the guard kernel. These tests are built into Strand and can be
used to perform type checking, arithmetic comparison and term comparison.
Like matching they serve to constrain when a clause can be used to reduce a
process.

Any number of guard kernel calls may appear in the guard of a clause. They
are executed from left to right textually after matching is complete and has
succeeded. Each test may succeed, fail or suspend. If all tests In a guard
succeed then the guard as a whole is said to succeed; if any test fails or
suspends then the guard as a whole immediately fails or suspends
respectively. Tests generally suspend if they encounter a variable.

The guard tests fall into five basic categories. These are:

        data type tests 
        comparisons
        assignment tests (does the variable have a value or not)
        failure of match and guard tests in all other clauses
        system state tests.

ASSIGNMENT
Assignment is used to generate a value for a variable that appears in a
process argument or to generate a value for a local variable. Remember a
variable matches against a variable.

Assignment is represented in program text as follows:

        Variable := Value

Assignment calls can only appear in the body of a clause.

One way to visualize the assignment operation is to think of a variable as a
box. The box has a label which corresponds to its name and it can hold any
data structure. The effect of executing an assignment is to place a value in
this box; any part of a program which refers to the variable via its label
can access its value.

Although this operation is simple there are some subtler points that must be
appreciated. Once a value has been assigned to a variable it cannot be
removed or changed. Variables which have this property are often called
single assignment variables.

Since it is possible to assign anything to a variable it must be possible to
assign a variable to a variable. For example consider the assignment

        Thing1 :=  Thing2

After this assignment, any part of the program that refers to thing1
automatically has access to the 'box' associated with Thing2. Another way to
think of this is that Thing1 becomes an alias (i.e. another name) for Thing2.

Finally, a 'box' cannot be placed inside itself; indeed anything containing
a box cannot be placed inside that 'box'. Thus the assignments Foo := Foo and
Foo := [Baz, Bar, Foo] are defined to be illegal. Another way of putting this
is that circular references and circular terms cannot be manipulated in
Strand.

BODY CALLS

The distinction between a body kernel call and a procedure call is simply
that body kernel calls invoke functionality specified by the language,
whereas procedure calls refer to procedures defined by the programmer.

The exact details of what takes place given a body kernel call is
implementation specific. However, in order to retain the simplicity of the
model presented so far it is sufficient to say that the system attempts to
'reduce' the 'process' associated with a body kernel immediately without
placing it in the process pool. It is only if the 'reduction' attempt
suspends that a process is placed in the process pool.

Now let us look in more detail at procedure calls. Recall that once a process
has committed to a particular clause there are only three kinds of action
that can be taken to reduce the process. We have already seen how the general
form of the body is related to each of these actions. All we need to do now
is describe in slightly more detail how a procedure call is related to a
change of state in a process and the spawning of further processes.

One of the procedure calls in the body causes the process to change state if
we recall that a process can be represented by a term of the form

        p(T1, T2,... Tn)                n>=0

and a procedure call has the general form

        b(T1, T2,... Tn)                n>=0

It is easy to see that the process is transformed so that it is associated
with the procedure definition that has the same name and arity as the
procedure call and the process arguments now become those specified by the
actual arguments of the procedure call. The data structures that become the
process arguments are comprised of the literal values that appeared in the
call plus the values that any variables might have received (which might
themselves be variables) through matching or assignment and any local
variables introduced in the clause that appear in the call. When the process
has been thus transformed it is placed back in the process pool to be further
reduced.

STRAND88 THE PRODUCT

STRAND88 as a product consists of more than just a strand compiler and
abstract machine emulator, it also provides an interactive shell and an
environment to aid software development.

The Strand Abstract Machine emulator, or SAM, is the part of the system that
performs the low level emulation. This is designed to run on a variety of
machines and enables compiled Strand code to be run on any SAM
implementation. When strand is started by typing 'st' to the operating system
prompt, it is the SAM that is invoked.

The services provide functions such as intermodule calling, I/O, and
monitoring computations.

The shell provides an user interface and starts applications, compilations
and other utility programs.


EXAMPLE SHELL SESSION

Strand is started by typing the following command in response to the OS
prompt:

        st

The first thing you see when Strand starts running is something like this:

        Welcome to STRAND88(TM). ...
        Copyright (c) Artificial Intelligence Limited ...

        1 node invoked.
        1>

The 1> is the STRAND88 Shell service prompt.

Suppose that the file demo1.std has the following contents:

        -compile(free).
        -exports([min/3]).

        min(X,Y,Z):-
                X < Y |
                        Z := X.
        min(X,Y,Z):-
                X >= Y |
                        Z := Y.
        min(X,Y,Z):-
                otherwise |
                        Z := error("illegal values",X,Y).


Lets start by calling the min procedure that is defined in the demo1 module.
We must tell Strand where the definition of the call can be found, the call
we want to use and any arguments. Normally it is necessary to compile a
module before it can be called, but we can assume that demo1 has been
compiled already.

        1>demo1:min(23,45,Z)

Strand responds with:

        [started,1]
        [exited,1]
        2>

Strand has started the correct definition of the min call to use, reduced it
to its kernel components and executed them. The started and exited text shows
you that the call typed on input line 1 has been started and exited The 2> is
Strand's prompt for another input, this time on the next line.

For simplicity, the rest of this document ignores strand's response of
started and exited.

The minimum of 23 and 45 has been assigned to the variable Z but we have not
seen the result yet. This variable is remembered by the shell, and whenever
the shell sees Z in the future it will substitute the appropriate value.
There are a number of ways in which we can examine the value of such shell
variables. The easiest mechanism is to use the watch facility. This requests
that the value assigned to a variable be displayed when that value is
assigned. In other words the result of the watch request may not be seen for
a long time, if the assignment is not performed for a while. If a value has
already been assigned then it is printed immediately the request is made. A
watch request is made by using the postfix operator '^'.

        3>Z^
        Z == 23
        4>

Shell variables can also be assigned directly:

        4>X := 123
        5>

Now we can call min again, this time using the new shell variable X, and
requesting a watch on a new variable Z2:

        7>demo1:min(34,12.3,Z2^)
        Z2 == 12.3
        8>

The demo2 module contains definitions for a procedure that recursively
generates a list of integers. The module looks like this:

        -compile(free).
        -exports([gen/2]).

        gen(0,X) :-
                X := [].
        gen(N,X) :-
                N > 0 |
                X := [N|X1],
                N1 is N - 1,
                gen(N1,X1).
        gen(N,X) :-
                otherwise |
                        X := error("not an integer",N).

This module contains the procedure that defines the gen procedure, which has
three clauses. The first clause simply assigns the variable X to an empty
list if the first parameter is zero.

The second clause is more complicated. This definition may be used when the
first clause cannot be used. This condition is checked with the "N>0" guard.
When Strand uses this definition, it makes X a list whose first element is
the value of N and whose second element is an unassigned variable called X1.
Another new variable, N1, is created which is one less than the value of N.
The definition concludes by calling gen again, this time with the new
parameters. By repeated recursion the list is extended until N is zero, when
the first clause is used to terminate the list correctly.

The third clause is invoked if the first parameter is not an integer, and
assigns the output parameter to an error structure.

To generate a list of 10 integers, and see the result, we can type in the
following:

        9>demo2:gen(10,I^)
        [I/1, 10]
        [I/2, 9]
        [I/3, 8]
        [I/4, 7]
        [I/5, 6]
        [I/6, 5]
        [I/7, 4]
        [I/8, 3]
        [I/9, 2]
        [I/10, 1]
        [I/tail,[],'at index', 10]
        10>

Procedures such as gen/2 that create a list, or stream, are referred to as
producers, and normally are coupled with procedures that accept a list as
input, or consumers. The concept of a stream is just a metaphor for a list
that is not complete.
A simple consumer is sum:

        -compile(free).
        -exports([sum/2]).

        sum(L,R):-
                sum(L,0,R).

        sum([],Accum,R):-
                R := Accum.
        sum([H|T],Accum,R):-
                integer(H) |
                        Accum1 is Accum + H,
                        sum(T,Accum1,R).
        sum(L,_,R):-
                otherwise |
                        R := error("invalid value",L).

The first feature to note is that this file contains two different
procedures. Both have the same name, but different numbers of arguments
(arity). The two procedures are sum/2 and sum/3, these are totally distinct
as far as the Strand system is concerned. In practice, of course, it might be
better to use different names, but that would miss the point. This example is
similar to the previous example but uses a list construct in the head of
sum/3 to decompose the list.

Try calling sum on a small list:

        10>demo3:sum([3,4,5],R1^)
        R1 == 12
        11>

We can tie together the producer and consumer by linking the two procedures.
A module that does this look like:

        -compile(free).
        -exports([go/1]).

        go(N):-
                integer(N) |
                        demo2:gen(N,L),
                        demo3:sum(L,R),
                        output(R).
        go(N):-
                otherwise |
                        display_n1("Error: parameter should be an integer")!.

        output(R):-
                data(R) |
                        display("The total is: ")&
                        display_n1(R)!.


FOREIGN LANGUAGE INTERFACE

Strand's foreign language Interface provides a method for including new guard
and body kernels within the Strand SAM, thus extending the set of available
primitive calls. These "foreign language" kernels can be called from Strand
programs compiled and run on the extended SAM, independent of the presence or
absence of the Strand environment. Foreign language code (i.e. non Strand
definitions) must be supplied to implement the kernel functionality. This
code is called when kernels are executed, either during guard testing or
activation of a clause body. The foreign code has access to the kernel
arguments and may also build new Strand data for return as output. A number
of foreign kernels are supplied with Strand (see the Strand Reference
Manual).

The foreign language interface also enables users to extend the range of
Strand datatypes available in the SAM. User datatypes can be used to store
foreign language data structures for use by foreign code. Alternatively, they
can be used to optimize access to and manipulation of data which could also
be encoded using Strand structures such as tuples or lists.

The foreign language interface provides one more useful function. It enables
external interrupt signals to be caught and handled without endangering the
correct functioning of the SAM. The interrupt functionality is integrated
with the kernel interface. It provides for enabling/disabling of interrupt
signals, notification of interrupt signals to the SAM, registration of
interrupt handlers to be called by the SAM and a stream interface between the
handler and a Strand 'interrupt consumer' process.

A library definition file is a text file containing declarations of foreign
kernels in a standard format. A declaration may spread over more than one
line, but each new entry must start after a new line. It may also contain
library file references, declarations to include kernel declarations from
other library definition files. When malis (the program that generates the
foreign kernel code) is called to build the interface code it must be
provided with the name of a single libdef file as argument. It reads the file
line by line recording each kernel declaration. When it reads a library file
reference it inserts lines from the library definition file referred to in
the declaration in place of the reference and continues processing. Once all
include and kernel declarations have been read and noted malis generates the
interface code used to invoke the foreign kernels declared in all the files
it has read. N.B. malis will follow up nested library file references i.e. a
library file included by a reference may contain other references which malis
will look up and include.

A foreign kernel declaration has the following format:

        Number Type Strand_name Foreign_name Mode Language

The Mode is a list of argument mode specifiers, one per kernel argument,
where a mode specifier is a symbol describing the argument. Up to 32
arguments may be defined for a foreign goal, although the kernel may have
none. The argument mode specifiers are written inside round brackets,
separated by commas. Each argument mode specifier indicates whether the
argument specified in the foreign kernel is an input or output argument. A
question mark (?) denotes an input argument, a caret (^) denotes an output
argument; a hyphen (-) denotes a raw argument i.e. one that Strand should
pass directly to the foreign code. The question marks, carets and hyphens are
separated by commas. Additionally the types of the arguments may be
specified, in which case the code to convert the types from Strand to or from
C or Fortran is generated automatically. If the foreign procedure returns a
result then this may be specified by using a colon after the other parameters
have been defined. For example:

  5103 guard s_cos cos (?) C       defines a kernel with one input argument
  5104 guard s_tan tan(?,?) c      defines a kernel with two input arguments
  5105 body s_sqrt sqrt(Real?):Real^ c   defines a kernel with its first    
                                         argument an real input and it      
                                         second an output real
  5106 guard s_sin sin(-) c        defines a kernel with one raw argument


FOREIGN LANGUAGE INTERFACE EXAMPLE

Here is an example of the Foreign Language Interface used to write 6 kernels
to manipulate a database.

Firstly, the 6 'foreign' code procedures:

OPEN

Void open_db(name,dbu,res)
REF name,dbu,res

Function - Opens database file identified by name.

Returns - Address of the database file in dbu and successful completion or
returns failure message.

CLOSE

Void close_db(dbu,res)
Ref dbu,res

Function - Closes database file identified by dbu

Returns - Success or failure

DATABASE

Void is_db(dbu)
Ref dbu

Function - Checks whether the location identified by dbu is a database.

GET RECORD

Void get_rec(dbu,key,result)
REF dbu,key,result

Function - Searches in the database identified by dbu, for record identified
by key.

Returns - Record identified by key or failure if not found.

PUT RECORD

Void put_rec(dbu,key,contents,result)
REF dbu,key,contents,result

Function - Puts contents in location identified by key in the database
identified by dbu.

Returns - Successful put or an error.

DELETE RECORD

void delete_rec(dbu,key,result)
REF dbu,key,result

Function - Deletes a record identified by key in database identified by dbu.

Returns - Successful deletion or error.

The corresponding library definition file entries for these 6 procedures are:

201 body db_open open_db(?,^,^) c
202 body db_close close_db(?,^) c
203 guard database is_db(?) c
204 body db_get get_db(?,?,^) c
205 body db_put put_db(?,?,?,^) c
206 body db_delete delete_db(?,?,^) c

>From this library definition file, the program MALIS will produce the 6
Strand user kernels which will allow access to the 6 procedures from within
the Strand program.

Typical use of these kernels from within Strand would be:
db(Name,Request)                                                
        open(Name,Id,Result),                         % open requested database
        dbm(Id,Request,Result).                       % send requests

dbm(Id,Request,Result) :-
        data(Result) |                                % check ready to send
                dbm1(Id,Request).                     % next request

dbm1(Id,[get(Key,Result) | Request]) :-               % get request
        get(Id,Key,Result), dbm(Id,Request,Result).

dbm1(Id,[put(Key,Rec,Result) | Request]) :-           % put request
        put(Id,Key,Rec,Result), dbm(Id,Request,Result).

dbm1(Id,[delete(Key,Result) | Request]) :-            % delete request
        delete(Id,Key,Result), dbm(Id,Request,Result).

dbm1(Id,[]) :-                                        % close request
        close(Id,Result), dbm(Id,Request,Result)

open(Name,Id,Result):- database(Id) | db_open(Name,Id,Result).
open(Name,Id,Result):- otherwise | Fail(Result).               % error

put(Id,Key,Record,Result):- database(Id) | db_put(Id,Key,Record,Result).
put(Id,Key,Record,Result):- otherwise | Fail(Result).          % error

get(Id,Key,Result):- database(Id) | db_get(Id,Key,Result).
get(Id,Key,Result):- otherwise | Fail(Result).                 % error

delete(Id,Key,Result):- database(Id) | db_delete(Id,Key,Result).
delete(Id,Key,Result):- otherwise | Fail(Result).              % error

close(Id,Result):- database(Id) | db_close(Id,Result).
close(Id,Result):- otherwise | Fail(Result).                   % error


PERPETUAL PROCESSES AND DATA FLOW

Strand supports recursion and has recursive data structures.
The basic idea of writing recursive algorithms is to express an algorithm in
terms of itself with a set of stopping conditions. In Strand there is only
one way to express iteration and that is through writing recursive or
perpetual processes. These processes typically have the form:

                my_process(....) :-
                        G1,G2,...Gn |
                                B1,B2,my_process(....).

                .
                .
                .
                my_process(...).

In this form there are a number of clauses which cause a new instance of
"my_process" to be spawned if they are used to reduce this process. Thus the
process is defined in terms of itself. Notice there is at least one
termination clause which represents a stopping condition.

STREAMS

In Strand processes communicate through the use of shared variables, consider
the following process pool:

                        producer(C), consumer(C)

These both share a common communication channel represented by the shared
variable C. If this variable is a list and the tail of the list is always an
unbound variable, then data can flow from producer to consumer through this
data structure. An incomplete data structure of this type is known as a
Stream.


PROGRAMMING CLICHES

Programming cliches exist in all languages and are the building blocks from
which all applications are designed.

In Strand there are 6 cliches:

1. Producer/ Consumer
2. Incomplete messages
3. Bounded Buffers
4. Difference Lists
5. Short Circuits
6. Blackboards


PROGRAMMING IN STRAND

Strand is a concurrent programming language that has parallel semantics (see
portability). This does not mean however that all code written in Strand will
run in parallel automatically. If processes are to run in parallel they must
be coded in such a way that this is possible and correct (i.e. required data
available, the correct communication data is set up).

Conversely, sequential algorithms can be successfully coded in Strand if
desirable. This can be achieved by synchronization through shared data
(remembering that Strand is a single assignment language and a process will
suspend until data is available). But the affect of distributing sequential
code over several nodes will be no different than executing on a single node
machine.


PORTABILITY

Strand has parallel semantics which means that the same code can be
distributed in many different ways with no need to adapt the functionality .
Distribution is achieved by annotation which does not affect the correctness
of the program and can be tailored to achieve maximum performance results.

Alternatively, using a standard set of annotations will give adequate
performance, but complete code invariance across many configurations.

In terms of user applications, providing STRAND88 is available on the
particular hardware platform, there should be no problems in porting from one
machine to another (assuming any external code included via the Foreign
Language Interface, will port to the new machine). The user may change the
annotation of process to processor for multi processor machines. This is
possible because in between the user application and hardware there is the
SAM (Strand Abstract Machine). This is the part of the system that does the
low level emulation. A SAM must exist for the particular hardware the user
wants to use (each SAM is written in 'C' and will vary slightly from machine
to machine). If it does then running the same Strand code on the different
machines should be possible.


                 Portable Parallel Programming with Strand

                                 Abstract

Strand is a programming language for developing applications to run on a
broad range of parallel computers. It is based on an inherently parallel
model. Therefore, the results from a Strand program are invariant to changes
in a program's parallelism. This paper will introduce the Strand language and
its interface to C and Fortran. A Strand/C program that predicts
characteristics of a protein's shape is described as an example of the
utility of Strand multilingual programming.

                               Introduction

The spread of parallel processing throughout the advanced computing arena is
restricted by a shortage of software. This scarcity of software is largely
due to the fact that each parallel computer has its own native programming
tools; a situation that binds an application program to a particular machine.

Lack of portable software hurts the industry from both ends of the user
spectrum. The supercomputing end user is reluctant to adopt parallel
processing due to the costs of parallelizing their programs for each
computer. The nonportability of the resulting programs makes parallel
computing a high risk endeavor.

Further exasperating the situation is the barrier placed before commercial
software vendors who need a broad market base in order to recoup their
investments. While the combined parallel computing market is large,
individual computers have too few users to support independent software
vendors. Therefore, with very few exceptions, third party software vendors
are staying away from the parallel processing market - denying users the
parallel math libraries, data base management systems and specific
applications they require.

Only with portable, parallel software development tools will parallel
computing be able to enter the mainstream of supercomputing. Two strategies
have been used to design these tools. One approach involves extending an
existing sequential language with parallel processing constructs. This is the
approach taken by Schedule [1] and Linda [2]. Alternatively, a new,
inherently parallel processing language can be developed such as ALP [3] or
Strand [4].

Selecting the preferred strategy is a subjective decision based to some
degree on the applications under consideration. It is my opinion, however,
that the best long range choice is a new, inherently parallel processing
language. This type of language has a better chance of evolving with the
state of the art in advanced computing. As automatic parallelization
techniques become available, the importance of inherently parallel languages
will increase since they should map better into these schemes.

The only inherently parallel programming language commercially available on
a broad range of computers is STRAND88. This language runs on sequential
computers as well as distributed memory and shared memory parallel computers.
The goal of this paper is to introduce the Strand language and how it is
currently being used to develop applications. The paper begins with a brief
description of the language. This is followed by a simple example to solidify
the preceding language summary. Next, the details behind calling C and
Fortran subprograms are given followed by a program to compute the linear
convolution of two arrays. This demonstrates both the use of C functions in
a Strand program and explicit parallelism in Strand. The paper concludes with
a discussion of current Strand applications with an emphasis on a protein
structure program developed at Caltech [5].

                            The Strand Language

The key to any portable parallel processing language is its architecture
independent view of the parallel computer. Figure one displays the Strand
computational model. The central feature of this model is the pool of light
weight processes representing the state of the computation.

-----------------------------------------------------------------------------
                 Figure 1: The Strand computational model

   -------------                              -------------------
   | Strand    |----------------------------> |                 |
   | Abstract  |                              |   Process Pool  |
   | Machine   | <----------------------------|                 |
   -------------                              |        O        |
      ^  ^  ^                                 |    O       O    |
     /   |   \                                |  O              |
   -------------------                        |                 |
   |    module 1     |                        |   O      O      |
   |                 |                        |              O  |
   | proc1(...):-... |                        |       O         |
   | proc2(...):-... |                        |            O    |
   | proc3(...):-... |                        |   O       O     |
   -------------------                        -------------------
-----------------------------------------------------------------------------

A Strand computation cycle begins when the Strand Abstract Machine (SAM)
removes a process from the pool. The SAM reduces this process into more
primitive processes using the information from the Strand program. These
processes are either placed back into the pool or, if they are so primitive
as to be immediately executable, they are executed. These executable
processes are referred to as kernels.

Strand kernels fall into two classes - user defined and language provided.
Language provided kernels consist of assignment, scaler arithmetic, and other
low level operations. User defined kernels are Fortran and C routines
provided by the user. The interface between user defined routines and Strand
is discussed in more detail in a later section.

In all cases, a process suspends until all of its required data is available.
Hence, Strand can be viewed as a data flow language.

As mentioned earlier, Strand processes are reduced according to the rules
from the Strand program. The Strand program consists of a collection of
modules containing Strand procedures. Hence we must understand the syntax of
a Strand procedure. A full description of Strand's syntax is available in the
textbook Strand: New Concepts in Parallel Programming [4]. Only a summary is
given here.

A Strand procedure consists of a collection of clauses with the same name.
The form of a clause is:

     H :- G1, G2, ... GM | B1, B2, ... BP.

H is the head of the clause. It contains the clause's name and the procedure
arguments. G1 through GM are the optional guard kernels which provide tests
of the procedure arguments. Finally, if the head matches the process being
reduced, the guards all succeed, and all of the required data is available,
then the optional body calls, B1 through BP, are activated. The body calls
can be other Strand procedures or body kernels.

Strand variables are single assignment and are indicated by an initial
uppercase letter. Processes share variables to communicate with each other.

Strand's simple data types are integer, real and string. These are self
explanatory. The compound types are the list and the tuple.

The list is defined as a collection of comma separated items:

     [1, 4, 2.0, "a string", Variable_1]

Lists elements are accessed using a head - tail notation:

     [H1 | Tail]

where H1 refers to the first element of the list and Tail to the rest of the
list.

A tuple is a more general form of a list. While a list is processed
sequentially, the elements of a tuple are accessed randomly. Tuples are in
some ways simpler than lists but will not be considered further in this
paper.

The computational model underlying Strand is inherently parallel. STRAND88,
however, does not automatically distribute processes across the nodes of a
parallel computer. The programmer must explicitly assign a process to a node.
For example, the body call to invoke the process, conv, on node N, is:

     conv(Argument_list)@N,

The node address can be a variable, a literal or in some cases a virtual
address such as next or previous. The syntax of interprocess communication
does not depend on whether the processes reside on the same node or not.
Hence, the result of a Strand program does not change as the explicit
parallelism is modified.

This discussion of Strand's syntax will be much clearer after considering the
following example.
                         Example: List Membership


Consider the procedure member. The procedure's three clauses define the
search of an input list for the indicated KEY. If it finds KEY the variable
Status is set to the string "ok" . Otherwise, Status is set to "fail".

The Strand Abstract Machine, SAM, would select the member procedure to reduce
a process of that name. The head of the procedure's first clause will
successfully match to a process with a non empty list for its second
argument. Note that the input list is split into its head and tail during the
matching. If the guard succeeds, Status is set to ok and the procedure
terminates.

If the head matches but the guard from the first clause fails, the second
clause is attempted. If this guard succeeds, the member procedure is
recursively invoked on the tail of the list.

Finally, if the second argument of the process being reduced is an empty
list, only the head of the third clause can successfully match. This
indicates that the key was never found in the list and Status is set to the
string fail.

-----------------------------------------------------------------------------
                  Figure 2: The list membership procedure

member(Key, [Head | Tail], Status) :-
     Key == Head |
          Status := ok.
member(Key, [Head | Tail], Status) :-
     Key =/= Head |
          member(Key, Tail, Status).
member(Key, [], Status) :-
     Status := fail.
-----------------------------------------------------------------------------

This discussion implied that the Strand Abstract Machine considered the three
clauses in sequential order. This is not required by the language. The
programmer just provides for all cases that can arise during process
reduction and does not have to worry about the order of the clauses.


                      The Foreign Function Interface

STRAND88 is a complete language providing all the constructs needed to
develop interesting applications. It would be of limited practical value,
however, if it could not call routines written in C and Fortran. Many
parallel algorithms contain large sequential segments. Hence, an effective
programming strategy is to embed sequential program fragments in the parallel
Strand program. This is easy to express with Strand as any body or guard
kernel can be replaced by a call to a C or Fortran routine.

To call a Fortran or C routine from Strand, the programmer must build a SAM
executable image that includes the foreign code. This is done using
information from a user created library definition file. This file defines
the mapping between Strand and the foreign code including procedure names and
argument types. Once the new SAM has been built, foreign routines can be used
just like any other Strand guard or body kernel.

STRAND88 includes a library of macros and functions to manipulate Strand data
types within Fortran and C routines. One can also create user defined,
foreign data types that can be passed between processes through the shared
variable mechanism.

The following example demonstrates the use of the foreign function interface
with a parallel algorithm.


                   Example: Parallel Linear Convolution

Linear convolution [6] is a fundamental algorithm of digital signal
processing. This section of the paper presents a procedure that returns the
linear convolution of two equal length arrays. If the two input arrays, X and
H, are of length NX, then the output array, Y, is of length (2*NX-1).

The algorithm is easily parallelized since each value in the convolution is
computed independently. A C function was written that computes a block of
convolution values starting at a given index. Therefore, each node is given
a different block of the convolution to perform. All nodes use the same C
function. The function declaration is:

     void conv(ind, size, n, nh, h, nx, x, ny, y)
     int ind, size, n, nx, ny, *ny;
     double h[], x[], *y[];

where the output values, ny and y, are pointers to the appropriate type. This
function will return in y the next size results of the convolution starting
with y [ind] . The Strand procedure call that maps to this function is:

     conv(Ind, Size, N, H, X, Y)

where H, X and Y are lists. Notice that each Strand list maps into two values
- an integer representing the length of the array and a double array.

Figure 3 presents the code for a procedure called convolution. This procedure
will fill the array Y with the convolution of the two input arrays, X and H.
An array in Strand is represented as a list of real values (though the
experienced reader may notice that Y is a list of lists - one for each block
of the convolution).

The input to the procedure is the number of elements to use from X and H
(NX), the requested number of nodes (Nodes), and the lists representing the
input arrays (X and H) . The first clause of the procedure computes the block
size for the convolution and the actual number of nodes to use. Notice that
// is the modulus operator. When all of the data is ready, the procedure that
does the convolution (conv1) is activated.

The conv1 procedure consists of two clauses. The first clause handles all but
the last block of the convolution and recursively calls itself to initiate
computation for subsequent blocks. These are the fixed blocks of size Size.
The second conv1 clause handles the last block which may have a different
size. Both clauses invoke a Strand procedure, worker, that calls the C
function.

This example contains a great deal of information. To appreciate its
operation, remember that the procedures all run concurrently and, except for
data flow synchronization, independently. The node addresses wrap around for
cases where there are fewer nodes than requested by the algorithm.

-----------------------------------------------------------------------------
                 Figure 3: Parallel convolution procedure.

     convolution(Nodes, NX, X, H, Y) :-
               NY is 2 * NX - 1,
               ModNY is NY // Nodes,
               comp_bsize(Nodes, NY, ModNY, Size, NodesToUse),
               conv1(NodesToUse, 0, NY, Size, NX, X, H, Y).

     conv1( Node, Index, NY, Size, NX, X, H, Y) :-
          Node > 0 |
               NewInd is Index + Size,
               NewNode is Node - 1,
               worker(Size, Index, NX, X, H, YHere)@Node,
               Y := [YHere | YRest],
               conv1(NewNode, NewInd, NY, Size, NX, X, H, YRest).
     conv1(0, Index, NY, Size, NX, X, H, Y) :-
               Length is NY - Index,
               worker(Length, Index, NX, X, H, Y)@0.

     worker(Length, Index, NY, X, H, Y) :-
               conv(Index, Length, NY, H, X, Y).

     comp_bsize(Nodes, NY, O, Size, NodesToUse) :-
               NodesToUse := Nodes,
               Size is NY / Nodes.
     comp_bsize(Nodes, NY, ModNY, Size, NodesToUse) :-
          ModNY > O |
               Size is 1 + NY / Nodes,
               NodesToUse is NY / Size.
-----------------------------------------------------------------------------

                               Applications

STRAND88 is being used for a number of research projects ranging from
numerically intensive applications such as weather modeling [7] and protein
structure determination [5] to abstract pattern matching problems such as the
human Genome project [4]. This section will focus on the protein structure
work.

The three dimensional shape of a protein is a critical property for
understanding the function of a protein. Full determination of protein shapes
is a vast problem taxing the abilities of the fastest supercomputers. A
valuable preliminary step is to predict where the protein folds into a
particular shape known as an alpha helix.

Two Caltech chemists, Joe Bryngelson and John Hopfield [5] developed a
sequential C program that learns how to predict alpha Helix locations. The
program learns how to predict alpha helices for new proteins by carrying out
a large nonlinear optimization calculation with known protein structures.

Stephen Taylor and Sam Southard [5] parallelized this sequential program
using STRAND88 to express the parallel parts of the algorithm. 75% of the
original C code was used in the final parallel program. Furthermore, Strand's
management of the parallelism was very flexible allowing easy experimentation
with different schemes for parallel execution.

The program displays linear speedup with up to 32 nodes. The drop off from
linearity at 64 nodes is an artifact of the protein data set divided among
64 nodes. A larger data set would allow the program to show linear speed up
beyond 32 nodes.

The logic required to manage parallelism inevitably carries a cost. The cost
incurred by the Strand/C program is small and is overcome by the time the
second node is added.

                                Conclusion

The Strand language was first offered commercially in the summer of 1989. In
a relatively short time it has established itself as a powerful parallel
programming language.

Strand applications research is progressing at an increasing rate as the
number of users grows. It is proving to be a useful language in the
numerically intensive areas with current research projects in seismic signal
processing [8] and computational fluid dynamics.

While the parallel processing industry is still a long way from adopting
programming language standards, it is hoped that STRAND88 will play a key
role in filling this void. To this end, future work with Strand will focus
on both improving the language and on building a large collection of
applications.


                                References

[1] J. Dongarra and D. Sorenson. "SCHEDULE: Tools for developing and
     analyzing parallel Fortran programs" in The Characteristics of Parallel
     Algorithms, MIT Press, 1987.

[2] W. Leler. "Linda Meets Unix", IEEE Computer, Feb. 1990, page 43.

[3] P. Vishnubhotla. "Concurrency and Synchronization in the ALPS Programming
     Language". Ohio State University Technical Research Report,
     OSU-CISRC-12/89-TR56, 1989.

[4] Ian Foster and Steve Taylor, Strand: New Concepts in Parallel
     Programming, Prentice Hall, 1990.

[5] S. Southard Jr. "A Parallel Algorithm for Alpha Helix Prediction", Draft,
     March 12, 1990.

[6] A. V. Oppenheim, R.W. Schafer. Digital Signal Processing. Prentice Hall,
     1975.

[7] I. Foster, R. Overbeek. "Experiences with Bilingual Parallel
Programming".  Presentation at the Fifth Distributed Memory Computing
               Conference, 1990.

[8] T. G. Mattson, K. Steer. "The Seismic Signal Processing Workbench", SSTI
     Internal document, 1990.